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Abstract. This paper describes a method for a model-based analysis 
of clinical safety data called multivariate Bayesian logistic regression 
(MBLR) . Parallel logistic regression models are fit to a set of medically 
related issues, or response variables, and MBLR allows information 
from the different issues to "borrow strength" from each other. The 
method is especially suited to sparse response data, as often occurs 
when fine-grained adverse events are collected from subjects in studies 
sized more for efficacy than for safety investigations. A combined anal- 
ysis of data from multiple studies can be performed and the method 
enables a search for vulnerable subgroups based on the covariates in 
the regression model. An example involving 10 medically related issues 
from a pool of 8 studies is presented, as well as simulations showing 
distributional properties of the method. 

Key words and phrases: Adverse drug reactions, Bayesian shrinkage, 
drug safety, data granularity, hierarchical Bayesian model, parallel lo- 
gistic regressions, sparse data, variance component estimation. 



1. INTRODUCTION 

This paper introduces an analysis method for safe- 
ty data from a pool of clinical studies called multi- 
variate Bayesian logistic regression analysis (MBLR). 
The dependent or response variables in the MBLR 
are defined at the subject level, that is, for each sub- 
ject the response is either or 1 for each safety issue, 
depending on whether that subject has been deter- 
mined to be affected by that issue based on the data 
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available at the time of the analysis. Safety issues 
can include occurrence of specific adverse events as 
well as clinically significant lab tests or other safety- 
related measurements. The predictor variables, as- 
sumed to be dichotomous or categorical, are all as- 
sumed to be observable at the time of subject ran- 
domization. The analysis is cross-sectional rather 
than longitudinal, and does not take into account 
the variability, if any, of the length of time different 
subjects have been observed. The primary predictor 
is study Arm, assumed to be dichotomous with val- 
ues "Treatment" or "Comparator." Other subject- 
level covariates may be included, such as gender or 
age categories, or medical history variables. One fea- 
ture of the MBLR approach is that the interactions 
of treatment arm with each of the other covariates 
are automatically included in the analysis model. 
Data from a pool of multiple studies (having com- 
mon treatment arm definitions) may be included in 
the same analysis, in which case the study identifier 
would be considered a subject covariate. Analyses 
involving a pool of studies are similar in spirit to 
a full-data meta-analysis. 
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Estimation of effects involves a hierarchical Bayesi- 
an algorithm as described below. There are two pri- 
mary rationales for the Bayesian approach. First, 
data concerning safety issues are often sparse, lead- 
ing to high variability in relative rates of rare events 
among subject subgroups, and the smoothing inher- 
ent in empirical Bayes shrinkage estimates can al- 
leviate problems with estimation of ratios of small 
rates and the use of multiple post-hoc comparisons 
when encountering unexpected effects. Second, 
MBLR fits the same analytical model to each re- 
sponse variable and then allows the estimates of ef- 
fects for the different responses to "borrow strength" 
from each other, to the extent that the patterns of 
coefficient estimates across different responses are 
similar. This implies that the different safety issues 
should be medically related, so that it is plausible 
that the different issues have related mechanisms 
of causation or are different expressions of a broad 
syndrome, such as being involved in the same body 
system, or different MedDRA terms in nearby lo- 
cations in the MedDRA hierarchy of adverse event 
definitions. The goal is to assist with the problem 
of uncertain granularity of analysis. The question 
of how to classify and group adverse drug reaction 
reports can be controversial because different assign- 
ments can change the statistical significance of count 
data treatment effects, and methods and definitions 
for comparing adverse drug event rates are not well 
standardized (Dean, 2003). Sometimes the amount 
of data available for each of the related safety is- 
sues is too little for reliable comparisons, whereas 
doing a single analysis on a transformed response, 
defined as present when any of the original issues 
are present, risks submerging a few potentially sig- 
nificant issues among others having no treatment as- 
sociation. The Bayesian approach is a compromise 
between these two extremes. 

The proposed methodology is not intended to re- 
place or replicate other processes for evaluating safe- 
ty risk but rather to support and augment them. 
In spite of the formal modeling structure, its spirit 
is more a mixture of exploratory and confirmatory 
analysis, a way to get a big picture review when 
there are very many parameters of interest. The re- 
sulting estimates with confidence intervals can pro- 
vide a new approach to the problem of how to best 
evaluate safety risk from clinical studies designed to 
test efficacy. 

This paper describes the statistical model and the 
estimation algorithms used in a commercial imple- 
mentation of MBLR. There is also some discussion 



of alternate models and algorithms with reasons for 
our choices. An example analysis utilizes data from 
a set of clinical studies generously provided by an 
industry partner, and a simulation provides infor- 
mation on the statistical properties of the method. 

2. THE BAYESIAN MODEL FOR MBLR 

As with standard logistic regression, MBLR pro- 
duces parameter estimates interpretable as log odds, 
and provides upper and lower confidence bounds for 
these estimates. The method is based on the hier- 
archical Bayesian model described below. Identical 
regression models (i.e., the same predictor variables 
for different response variables) are estimated as- 
suming that the relationships being examined are 
all based on the same underlying process. The re- 
sponse variables represent issues comprising a po- 
tentially common safety problem and the underlying 
process is an adverse reaction caused by the treat- 
ment compound. The regression models are various 
examinations of relationships between subgroups de- 
fined by the covariates and the response issues. The 
Bayesian estimates of treatment-by-covariate inter- 
actions are conservative (estimates are "shrunk" to- 
ward null hypothesis values), in order to reduce the 
false alarm due to high variance in small sample 
sizes. This conservatism is a form of adjustment for 
multiple comparisons. 

It is natural to desire a comparison of MBLR with 
a more standard analysis, which, for the present pur- 
pose, means a logistic regression model where the 
estimates for the different responses are not shrunk 
toward each other, and where interactions between 
treatment and other covariates are not being esti- 
mated. However, it can often happen, with sparse 
safety data involving rare adverse events and the use 
of other predictors in addition to the treatment ef- 
fect, that standard logistic regression estimation can 
fail, because the likelihood function has no unique 
finite maximizing set of parameters. Gelman et al. 
(2008) discuss this problem, caused by what they 
call separation and sparsity, and suggest the auto- 
matic use of weakly informative prior distributions 
as a default choice for such analyses. Along the lines 
of the Gelman et al. (2008) suggestion, we will com- 
pare MBLR to a "weak Bayes" method that corre- 
sponds to setting certain variance components (that 
are estimated by MBLR) to values selected to be so 
large that the resulting estimates would be virtually 
the same as those of standard logistic regression if 
the data are not so sparse as to be unidentifiable. 
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This comparison method will be denoted regularized 
logistic regression (RLR). 

The event data can be considered as a i^-column 
matrix Y, with a row for each subject and a column 
for each issue, and where Ygk = 1 if subject s experi- 
enced issue k, otherwise. Since all subject covari- 
ates are assumed categorical, we will use a grouped- 
data approach, where there are rij subjects (i = 
l,...,m) that have identical covariates and treat- 
ment allocation in the ith. group, and where Nn^ of 
these subjects experienced issue k. 

MBLR requires the inclusion of the treatment arm 
and of one or more additional predictors in the model, 
where all predictors are categorical. 

Across the set of issues a single regression model is 
used. If there are J predictors excluding Treatment, 
and the jth predictor has gj categories, j = 1, . . . , J, 
then there will be G = ^gj subgroups analyzed. 
The model will usually have 2G + 2 — 2 J degrees of 
freedom and estimation is performed by constrain- 
ing sums of coefficients involving the same covari- 
ate to add to 0. The Bayesian methods presented 
here allow estimation in the presence of additional 
collinearity of predictors, in which case the com- 
puted posterior standard deviations would then re- 
flect the uncertainty inherent in a deficient design. 

For the ith group of subjects, the modeled prob- 
ability of experiencing issue k is Pik, where 

(1) Pik = 1/[1 + exp(-Zifc)] and where 

Zik = «Ofc + ^ ^igOLgk 
l<g<G 

(2) 

^<g<G ^ 

The G columns of X define the G dummy vari- 
ables for the J covariates, and T, is an indicator for 
the treatment status of the ith group. The values 
of OLgk {g = 0, . . . ,G; k = l, . . . , K) define the risk of 
issue k for the comparator subjects. As mentioned 
above, the sums Ylg'^gk = 0, where the sums are 
over the categories of each covariate for each k. The 
more natural quantities (aofc + oigk) are the log odds 
that a comparator subject in subgroup g will ex- 
perience issue k, g = 1, . . . ,G, averaged across the 
categories of other predictors not defined by sub- 
group g. 

Concerning treatment effects, the quantities (/3ofc + 
/3gk) are the estimated log odds ratios for the risk 
of issue k (treatment versus comparator) that sub- 
jects in group g experience, g = 1, . . . ,G, averaged 



across the categories of other predictors not defined 
by subgroup g. The sums Ylg l^gk are constrained 
just as the a's were. 

If G is large, there will be many possible subgroup 
comparisons, and, since these confidence intervals 
have not been adjusted for multiple comparisons, 
caution is advised in interpreting the largest few of 
such observed subgroup estimates. The MBLR es- 
timates of these quantities are designed to be more 
reliable in the presence of multiple comparisons be- 
cause subgroup-by-treatment interaction effects are 
"shrunk" toward in a statistically appropriate way, 
and there is also a partial averaging across issues, so 
that subgroup and treatment effects and subgroup- 
by-treatment interactions can "borrow strength" if 
there is an observed similar pattern of treatment 
and subgroup effects in most of the K issues being 
analyzed. When configuring a multivariate Bayesian 
logistic regression, the analyst should try to select 
those issues for which there is some suspicion of 
a common medical mechanism involved. If the Bayes- 
ian algorithm does not detect a common pattern of 
subgroup effects, then the Bayesian algorithm will 
perform little partial averaging across issues, be- 
cause corresponding variance component estimates 
will be large. 

The Bayesian model is a two-stage hierarchical 
prior specification: 

agk\Ag ~ N{Ag,a\), 

(3) 

k = l,...,K-g = l,...,G, 

(4) pQk\BG^N{B^,al), k = l,...,K, 
(3gk\Bgr~.N{Bg,al), 

(5) 

k = l,...,K;g = l,...,G, 

(6) Bgr^N{0,T^), g = l,...,G. 

The prior distributions of a^k, k = 1,. . . ,K, and 
oi Ag, g = 1, . . . ,G, and of Bq are assumed uniform 
within (— cx),+oo). Equations (3)-(5) embody the 
assumption that coefficients for the same predictor 
across multiple issues cluster around the predictor- 
specific values {Ai, . . . , Aq, Bq, . . . , Bq) with the de- 
gree of clustering dependent on the magnitude of 
three variances (cr^, do, o"^). If any of these vari- 
ances are near 0, there will be a tight cluster of the 
corresponding regression coefficients across the K 
responses, whereas if they are large, there may be 
no noticeable common pattern across k for predic- 
tor g. The values of qqa; correspond to the constant 
terms in the regressions, and we assume no com- 
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mon shrinkage of constant terms across issues, since 
the absolute frequencies of the issues are not being 
modeled here. 

Equation (6) embodies the assumption that the 
null hypotheses Bg = (i.e., no treatment-by-covari- 
ate interactions when averaged across responses) are 
given priority in the analyses. This is the assumption 
that helps protect against the multiple comparisons 
fallacy when searching for vulnerable covariate sub- 
groups. The value of determines how strongly to 
shrink the G prior means Bg toward in the second 
level of the prior specification. 

The four standard deviations {aA,cro,aB,T) have 
prior distributions assumed to be uniform in the 
four-dimensional cube < a, t <d. Their joint pos- 
terior distribution is approximated by a discrete dis- 
tribution for computational convenience, as described 
below. The posterior distribution of the coefficients 
{Ag,Bg,agk,l3gk} IS dcflued as a mixture of the dis- 
tributions of the coefficients conditional on the pos- 
sible values of the variance components. The method 
produces an approximate variance-covariance ma- 
trix for all the coefficients, and this also allows the 
estimation of standard deviations and confidence in- 
tervals (credible intervals) for linear combinations 
of parameters such as the quantities (/3ofc + (3gk) de- 
scribing the total treatment effect estimates for each 
subgroup. 

General discussion of hierarchical Bayesian regres- 
sion models is available in Carlin and Louis (2000), 
although the particular model (involving multiple 
responses) and estimation methods used in this pa- 
per are not discussed there. Searle, Casella and Mc- 
Culloch [(1992), Chapter 9] also discuss related meth- 
ods, including a logit-normal model somewhat sim- 
ilar to this one. 

3. ESTIMATION DETAILS 

Estimation of MBLR Parameters 

The estimation algorithm for MBLR is based on 
separate maximizations of the posterior distributions 
of the coefficients, conditional on the values of the 
variance components. Then these posterior distri- 
butions are averaged to provide an overall posterior 
distribution, where the weights in the average are 
determined by the Bayes factors for different val- 
ues of the vector of the four variance components. 
First we assume that the four standard deviations 
((TA, To, CTfi, t) are fixed and known and consider es- 
timation of the other parameters. 



Estimation of Coefficients and Prior Means 
Conditional on Prior Standard Deviations 

There are M = 2{G + 1){K + 1) — 1 such parame- 
ters: 2G + 1 prior means, {G + 1)K values agk and 
{G + 1)K values of f3gk- However, 2J{K + 1) sums 
of these parameters are defined as 0, leaving M* = 
2{G — J + 1){K + 1) — 1 dimensions for estimation. It 
is convenient to imagine that subjects are grouped 
according to unique values of their covariates and 
treatment allocation, so that the data are the sam- 
ple sizes Hi and the counts Nik {i = 1, ■ ■ ■ ,m; k = 
1, . . . , K), where i indexes m strata defined by unique 
values of covariates and treatment. The joint distri- 
bution of the parameters and the data can be rep- 
resented as 

p{Ai,...,Ag,Bo,...,Bg) 

(7) • Ylpiaok , . . . , OGfc , /3ofc , • • • , /Scfc I {A}{B}) 

k 

■pmk}\{A}{B}{a}m. 

The prior distributions of Ai,. . . ,Ag, Bq and the 
{aofc} are assumed uniform over (— oo, +00), whereas 
all the remaining parameters have prior distribu- 
tions as given in equations (3)-(6). 

Therefore, if logL is the log posterior joint distri- 
bution of all the parameters, then, up to a constant. 



21ogL: 



Y,B'g/T' + iG-J)logir' 

g>0 

Y,Y.^agk-Agf/a\ 

g>0 k 



+ {G-J)K\og{a\) 



Y,iPok-B^f/al + K\og{a, 



(8) 



3>0 k 



gk 



Bgf/al 



+ {G-J)K\og{al) 

i k 

+ {rii - Nik)\og{l - Pik)]. 

(8), the terms involving log(r^), log(o"^) and 
log(cj^) all have a factor {G — J), rather than the 
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more natural G, since there are G values {Ag} and 
{Bg}. But since they are being estimated subject 
to J constraints where subsets of them add to 0, 
the factor {G — J) is substituted, analogous to the 
way REML estimates are defined for variance com- 
ponents in a frequentist analysis. For fixed variance 
components, maximization of (8) with respect to all 
other parameters, remembering that the Pik are de- 
fined by (1) and (2), involves a relatively straight- 
forward modification of the usual logistic regression 
calculations. The prior means {Ag} and {Bg} are 
treated analogously to the coefficients {ctgk} and 
{/3gk} during the Newton-Rap hson maximization of 
logL. Each iteration involves calculation of the vec- 
tor S of M first derivatives of log L with respect to 
the parameters in (8) and the M x M Hessian ma- 
trix H of the negative second derivatives of logL. 
The initial values of aofc are log{N_^i^/{n^ — A^+fc)), 
k = 1, . . . ,K (the subscript means sum over the 
values of i), whereas the initial values of all other 
parameters are 0. 

Upon convergence of the maximization, the vari- 
ance-covariance matrix of the estimated parameters 
is assumed to be 

(9) v = V{aA,ao,aB,T) = H-\ 

[Actually, the matrix H will be singular because 
of the constraints that reduce the rank of H. The 
interpretation of (9) is as follows. Define a subset 9* 
of M* parameters out of the M- vector 6, where one 
parameter from each constrained subset has been 
omitted, but will be constrained to be equal to the 
negative of the sum of the other parameters in its 
subset. Define the M x M* matrix Z that converts 
from 6* to 6, that is, 6 = Z6*. Then (9) is interpreted 
as V = Z{Z^HZ)~^Z^. The same transformation is 
used during the Newton-Raphson maximization of 
logL. Also, in (lib) and later, the determinant of V 
is computed as the determinant of V* = {Z^HZ)~^ .] 

The computation of V as uses the assump- 
tion that the counts {Nik} are independent across 
both i and /c, conditional on the parameters. The 
occurrence of different events in the same subject 
may be connected via the parameters, but not oth- 
erwise correlated in this model. If this assumption 
is violated, the variances in V may be underesti- 
mated. Since the M parameters include both all the 
coefficients as well as their prior means, the vari- 
ances in V for any one component automatically 
include uncertainty due to correlation with all other 
components. In particular, uncertainty in the prior 
means {Ag^B^^Bg} is taken account of in the esti- 
mated posterior variances of the {agk-,l3gk} (up to 



the accuracy of the approximate multivariate nor- 
mality of the joint posterior distribution of the pa- 
rameters) . 

Accounting for Uncertainty in the Prior Standard 
Deviations 

The prior distribution of the set of possible values 
of (cTA, o"o, o"B, t) is assumed to be uniform within 
the four-dimensional cube with limits (0,(i), where 
a default value of d = 1.5 is selected as discussed 
below. A discrete search method approximates the 
posterior distribution within this cube. Before dis- 
cussing the details, consider the situation where the 
prior standard deviation vector cj) = [a a-, (yoif^BiT) is 
assumed to be one of S discrete values (l)i,(f)2, ■ ■ ■ ,(j)s- 
Denote the vector of coefficients and prior means by 

9 = {Ai,...,Ac,Bo,...,Bc,aoi,...,aGK,Poi,---, 
I^gk), and assume that the maximized logL and the 
estimated posterior mean and covariance matrix of 9 
are {logLs,9s, Vs) li cp = 4>s, s = 1, . . . , S . Then the 
marginal posterior distribution of 9, adjusting for 
uncertainty in (p, is assumed to be multivariate nor- 
mal with mean 9 and covariance matrix V, where 

(10a) 9 = Y,T^s0s, 

s 

(10b) V = Y,^s[Vs + {9s-9){9s-9Y], 

s 

and where vr^, the posterior weight given to (p = (j)s, 
s = 1, . . . , 5", is defined by 

(11a) 7rs = BFs/{BFi + --- + BFs), 

(lib) BFs = exp(logL,)Vdet(y,). 

The quantity BFg is the (relative) Bayes factor 
for the hypothesis (j) = (ps- The usual definition of 
the Bayes factor requires the integration of the joint 
likelihood over the space of all parameters not spec- 
ified by the hypothesis — in this case the space of 
all 9. Using the approximation of this likelihood as 
proportional to a multivariate normal density with 
covariance matrix Vs, and the known fact that vol- 
ume under the multivariate exponential form 
exp[—9'^{Vs)~^9/2] is proportional to the square root 
of the determinant of Vg , the definition of BFg is as 
given in (11). The approximation (11) is the stan- 
dard Laplace approximation often used for numeri- 
cal integration in Bayesian methods. However, a dif- 
ferent justification for computing (lib) in order to 
obtain estimates for variance components is given 
by the theory of h-likelihood (Lee and Nelder, 1996; 
Lee, Nelder and Pawitan, 2006; Meng, 2009). 
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Selection and computation of the values 

(</)s,7rs), s = l,...,S 

Representing the 4-dimensional naturally contin- 
uous distribution of by a set of discrete points is 
a challenge. Assuming a range of d = 1.5 for each 
element of (j) and a spacing of 0.1 would mean a grid 
of 5 = 15^ > 50,000 points, the vast majority of 
which would have values of tt^ nearly 0. Determi- 
nation of a set of just 5 = 33 points to represent 
the approximate posterior distribution of (p is per- 
formed as outlined next. A logistic transformation 
is used to convert the bounded cube (0,d)^ to the 
unbounded region where all four elements can range 
from (— oo,-|-oo) by defining 

A = (Ayi, Ao, Ab, Ar) where 

(12) aA = d/{l + e-^^), ctq = d/(l + 6"^°), 

aB = d/{l + e-^^), T = d/{l + e-^^). 

With this transformation, a uniform prior distri- 
bution on (0,d) for each a corresponds to a prior 
distribution for each A over the real line of /(A) oc 
a{X){d — o"(A)). The purpose of this transform is 
to allow simpler search procedures that don't have 
to worry about boundary constraints, as well as to 
make approximation of the posterior by a multi- 
variate normal distribution more accurate. Then the 
posterior density of A is assumed to be 

g{X) = g{XA,XoAB,K) 

(13) «/(Aa)/(Ao)/(Ab)/(AO 

• exp(log Ls) Vdet(T^, 

where logL and V in (13) are now functions of A, 
and the A's vary over (— oo,-|-oo). 

The determination of the discrete distribution {(pg, 
TTs), s = 1, . . . , 5, is a five-step process: 

Step 1: Use the method of steepest ascent to find 
the value A™*^^ that maximizes g{X) in (13). Deriva- 
tives of g are computed numerically as first differ- 
ence ratios with respect to each of the four argu- 
ments. The starting value for the search is A = (0, 0, 
0,0). 

Step 2: Construct a design of = 33 A-values by 
adding 16 points on the surface of each of two con- 
centric spheres centered at X^^^. The points on the 
inner sphere consist of 8 star points, where one com- 
ponent of A is A™*^ ± 2So and the other three com- 
ponents equal A™'^^, and 8 half- fractional factorial 
points, where all components are A™^'^ it 6q. The 
points on the outer sphere are similar to those on 



the inner sphere, except that 5o is replaced by 1.5(5o 
and the fractional factorial points are from the op- 
posite half fraction as the fractional factorial points 
on the inner sphere. The default value of 5o = 0.3 on 
the scale of A. Visualizing the geometry of the de- 
sign, if a 4-dimensional sphere has radius 1.5 times 
another, it encloses about 5 times the volume. 

Step 3: The double central composite design of 
Step 2 is centered but not scaled to the actual dis- 
tribution g{X). To find the appropriate scale factors 
in each dimension, 5 = (61,52,63,64), for a better fit- 
ting design, a quadratic response surface model is fit 
to values of log 5(A) across the S points of this initial 
design. The fitted model is 

(14) logg{X) = Co + CjXj + CjjXjXj. 

i i<j 

Now if the quadratic model fit exactly (i.e., if g 
were exactly multivariate normal), then the second- 
order coefficients Cij would specify the elements of 
the inverse of the posterior covariance matrix of A. 
Accordingly, we get what are hoped to be approx- 
imate posterior standard deviations by setting S = 
vector of square roots of the diagonal of H~^, where 
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Step 4: Next a new design like that of Step 2 is 
constructed except that the 60 used in Step 2 for all 
4 dimensions is replaced hy 6 = {61,62,63,64) from 
Step 3, so that the spheres are scaled differently in 
each dimension. The values of log g( A) are computed 
for these 32 new points and a new quadratic re- 
sponse surface is fit to this 33-point final design. 
Let the peak of this fitted surface be denoted A^*, 
which will not exactly equal A™^^, and redefine 6 = 
{61,62,63,64) by using the coefficients from the new 
quadratic response surface in (15). 

Step 5: The discrete distribution defined by {X^^\ 
g{X^'^^),s = 1, . . . , 5} as computed in Step 4 will rough- 
ly approximate the continuous distribution defined 
by ^(A), but the approximation can be improved by 
modifying the = 33 probabilities to constrain the 
4 means and 4 standard deviations of the discrete 
distribution to exactly match the values A*^* and 6 
that were computed from the response surface fit 
of Step 4. The final probabilities vr^, s = 1, . . . , S , 
are computed as the solution to the following con- 
strained optimization problem: 
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Find positive vri , . . . , vr^ that minimize the Kul- 
Iback-Leibler divergence 

ifZ = J]5(A(^))logb(AW)/7r,], 

s 

subject to the 9-dimensional constraints 

(16) 

s s 

J]v^,(A(^)-Afi*)2 = 5^ 

s 

where the last two equations are each interpreted 
as 4 constraints, one for each component of A. The 
constrained minimization problem of (16) is solved 
using the method of Lagrange multipliers combined 
with a Newton-Raphson solution of the resulting 9 
equations. 

Thus, the {tTs} used in (10) are the solution to (16) 
rather than the more direct values in (11). They dif- 
fer from (11) by incorporating the Jacobian terms 
of (13) and the further modifications needed to sat- 
isfy the constraints in (16). The values of {4>s} used 
in (10) are the back-transformations defined by (12) 
of the final S = 33 points {A^} used in Steps 4 and 5. 

Estimates Using Regularized Logistic Regression 
(RLR) 

To compare the MBLR results to standard logis- 
tic regression, and still be able to avoid problems 
with nonidentifiability, as discussed above, the RLR 
algorithm is defined by fitting MBLR under the con- 
straints 

(17) aA = 5, ao = 5, ctb = 0.001, t = 0.001. 

Setting as and r very close to effectively con- 
strains the estimates of covariate-by-treatment inter- 
actions to be 0. Setting a a and uo to be very large pre- 
vents the estimates across different response events 
from shrinking toward each other The rationale for 
thinking that a prior standard deviation of 5 is very 
large for a logistic regression coefficient is as fol- 
lows. Remembering that the coefficients are inter- 
preted as logs of odds ratios, an increase of 5 in 
a coefficient corresponds to a multiplicative factor 
of = 148.4 in an odds ratio. With respect to the 
assumed normal prior distributions in equations (3)- 
(6), the prior standard deviation of 5 implies that 
about one-third of all estimated odds ratios are ex- 
pected to be outside the range of (1/148 = 0.007, 
148). This certainly seems to be well beyond the 



range of expected odds ratios in any medical risk es- 
timation situation. See Gelman et al. (2008) for a re- 
lated discussion. [In the Bayesian setup described 
above, we use as default limits for the prior standard 
deviations (0, d = 1.5). Considering a prior standard 
deviation to be as large as 1.5, where e^'^ = 4.5, im- 
plies that about one-third of the estimated odds ra- 
tios would be outside the range of (1/4.5 = 0.22, 
4.5), which seems a bit of a stretch, but barely con- 
ceivable.] 

Using the values in (17) for the prior standard de- 
viations, this alternative weak Bayesian prior method 
estimates the parameters and their variances us- 
ing the iterative Newton-Raphson estimation de- 
scribed above. The resulting estimates are computa- 
tionally reliable even if many of the response events 
are sparse. Such estimates perform very little shrink- 
age across response models because the prior stan- 
dard deviations in equations (3)-(6) are large com- 
pared to the standard errors of the (estimable) logis- 
tic regression coefficients. However, the MBLR and 
RLR models as formulated will not protect against 
problems of estimability in case every response is 
quite sparse, because of the use of an improper prior 
for the prior means {Ai, . . . ,Ag,Bq). If certain co- 
variate or treatment categories are perfectly corre- 
lated with every response, then one must either drop 
such predictors or add additional response variables. 

The Bayes factor for 0o = (5,5,0.001,0.001) can 
be computed and compared to the 33 values found 
in the final grid of the Bayesian estimation described 
above, which provides further evidence regarding 
the prior standard deviations. In particular, large 
Bayes factors against (po imply that the MBLR model 
fits the data better than the RLR model, meaning 
that there is significant evidence that either the re- 
sponses have similar covariate profiles or that there 
are significant covariate-by-treatment interactions. 

Confidence Intervals for Odds Ratios 

Let the final estimate of, for example, f3gk be bgk, 
so that the odds ratio point estimate is ORgk = 
exp(6gfc). Using the normal approximation to the 
posterior distribution of the coefficients and the es- 
timates of V in equation (10), 90% confidence in- 
tervals (posterior credible intervals) for the corre- 
sponding odds ratios are given by 

OR.05 = exp[bgk - lMbJv{fi^)] < OR 

(18) 

< exp[6gfc + lM5^v{f3gk)] = OR.95. 
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For the main effects of covariates or for treatment, 
these provide confidence (credible) intervals for odds 
ratios of the predictor vs the response outcome. The 
odds ratio comparing two categories of a multicat- 
egory covariate would be found by taking the ratio 
of the corresponding exponentiated coefficients. 

Interpreting the interaction effects of covariates 
with treatment arm is tricky, since it would involve 
ratios of odds ratios. To aid in interpretation, one 
can present in addition to the interaction coefficients 
themselves, the sums of the treatment coefficient 
plus the interaction coefficients. Confidence intervals 
for these sums are formed in the usual way, taking 
into account the covariances between the treatment 
coefficient and the interaction coefficients. When 
these sums and their confidence limits are expo- 
nentiated, we get estimates and limits for subgroup 
treatment-by-outcome odds ratios. These estimates 
are oriented toward finding potentially vulnerable 
subgroups where the adverse effect risk of treatment 
is especially high. 

4. DISCUSSION OF METHODS AND 
ALTERNATE MODELS 

The philosophy of estimation is not to try to model 
the medical mechanisms perfectly, but to provide 
a reliable compromise between pooling related sparse 
events in order to increase the sample size, and fit- 
ting separate models to each event, with the corre- 
sponding loss of power due to small samples. The 
selection of which issues to include in an MBLR is 
important. There needs to be at least a superficial 
plausibility that all or many of the selected outcome 
issues might have similar odds ratios with treat- 
ment and with the covariates in the model, what 
Bayesians call exchangeability. Sometimes it may be 
difficult to decide what other issues to include if at- 
tention has focused primarily on a single and seem- 
ingly unique issue such as subject death. Because it 
takes several degrees of freedom to estimate a vari- 
ance component, the values of some of the standard 
deviations in equations (3)-(6) may be poorly esti- 
mated if K and/or G are not large, but the use of 
Bayes factors and the computation of the vr^ in (11) 
and (16) allow some assessment and adjustment for 
this uncertainty. 

The current model is quite similar in spirit to, and 
somewhat inspired by, that proposed by Berry and 
Berry (2004). They also assume that drug adverse 
reactions are classified into similar medical group- 
ings in order to use a shrinkage model to allow bor- 
rowing strength across similar medical events. They 



focus on treatment/comparator odds ratios only and 
do not consider covariates or the use of logistic re- 
gression. They also define a more complex model 
having many more variance components than the 
one proposed here. 

One might ask the question of why estimate co- 
variate effects at all, since in a randomized study 
the covariates should all be nearly orthogonal to the 
treatment variable? The rationale in MBLR is not so 
much to adjust for potential biases in the treatment 
main effect, but to be able to include treatment- 
by-covariate interactions in order to detect possibly 
vulnerable subgroups that might react differently to 
the treatment. When G is large (many covariate cat- 
egories) it will often be difficult to estimate so many 
parameters unless all the issues being modeled oc- 
cur frequently. The multiple comparisons involved 
make any search for vulnerable subgroups difficult 
and subject to false alarms, especially for sparse 
events. This makes the use of Bayesian shrinkage of 
the interaction terms in (6) especially valuable: it ne- 
gotiates the bias-variance trade-off among multiple 
event rates having possibly very different sampling 
variances. Without this smoothing effect, estimates 
of interactions affecting rare events will be so vari- 
able as to be useless, which is why the RLR method 
is defined to estimate only main effects. 

The importance of avoiding undue rejection of the 
null hypothesis in the presence of multiple post-hoc 
comparisons is central to being properly conserva- 
tive when evaluating treatment efficacy. There is 
a question as to how much this conservatism should 
extend to exploratory analyses of safety issues. For 
example, the prior specification (6) shrinks the inter- 
action prior means Bg toward 0, whereas the main 
effect prior means Ag and Bq are not shrunk to- 
ward 0. We prefer to maintain maximum sensitivity 
to safety main effects, while accepting that true in- 
teraction effects are less likely and need more false 
alarm protection. We also encourage parallel com- 
putation of the minimal-shrinkage regularized LR 
estimates discussed above, so that the analyst can 
perform an easy comparison and sensitivity analysis 
of the effects of shrinkage. 

The prior distributions in equations (3)-(6) are all 
assumed to be normal distributions. Many Bayesian 
researchers have pointed out that since normal dis- 
tributions generate few outliers, outliers may be cor- 
respondingly suppressed under this assumption. 
Commonly suggested alternative prior distributions 
are the double exponential and Student's t, which 
tend to shrink outliers less. The double exponential 
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Issue 


Treatment events 


Comparator events 


95% C.I. for Odds Ratio 


Anuria 


8 





(1.0 , 295.4) 


Dry mouth 


308 


65 


(3.9 , 6.7) 


Hyper kalaemia 


218 


162 


(1.1 , 1.7) 


Micturition urgency 


13 


3 


(1.2 , 12.6) 


AT J- 

JNocturia 


19 


7 


(1.1 , 6.1) 


Pollakiuria 


193 


34 


(4.1 , 8.5) 


Polydipsia 


49 


4 


(4.2 , 29.3) 


Polyuria 


100 


17 


(3.5 , 9.8) 


Thirst 


543 


66 


(7.5 , 12.6) 


Urine output increased 


13 


1 


(1.7 , 48.8) 


Subject counts: 


Treatment 


= 3110 


Comparator = 2642 



Display 1. Statistics for ten issues related to dehydration/renal function for the pooled studies. 



("lasso") prior has nonstandard theoretical proper- 
ties that make computation of standard errors of co- 
efficients problematical, and so have been ruled out 
for this application. Alternative distributions like 
Student's t are difficult to handle computationally in 
our complex situation where there are hundreds of 
coefficients and multiple variance components. The 
normal model that we use has a concave log pos- 
terior density function and the iterative estimation 
algorithm is guaranteed to converge. 

There is a similar computational feasibility ra- 
tionale for using the discrete approximation to the 
distribution of prior standard deviations. It is more 
common in the recent Bayesian literature to use 
Gibbs sampling or another Markov chain Monte 
Carlo (MCMC) method to estimate the posterior 
distributions of all parameters. Two reasons for pre- 
ferring to avoid such methods are as follows: first, 
we want to allow scientists without much statisti- 
cal sophistication, much less experience with fancy 
Bayesian computational methods, to use MBLR and 
these users would have trouble assessing convergence 
of such high-dimensional MCMC runs. Second, these 
users might also be uncomfortable with the fact that 
repeating an analysis on the same data typically 
leads to slightly, but noticeably, diff'erent answers. 
The method for handling the variance component 
estimation outlined above provides computationally 
and statistically reliable answers within a feasible 
computational burden. As described above, there 
are three roughly equally expensive stages in the 
model fitting computations: the two preparatory 
stages of finding the maximum of the posterior dis- 
tribution and then evaluating it on an initial grid to 
find scale parameters in each direction, and the last 



stage of evaluating the model on the final grid to ap- 
proximate the posterior distribution of the variance 
components. 

5. EXAMPLE ANALYSIS 
Data Description 

The data used for the example analyses are from 
a pool of eight studies, kindly contributed by an 
anonymous partner. Four of the studies were for one 
indication and four were for a second indication. 
There were a total of 5752 subjects in the pooled 
studies, 3110 in the Treatment arm and 2642 in the 
Comparator arm. 

Display 1 shows statistics from these studies for 
a set of ten issues related to dehydration and/or re- 
nal function. All ten issues show up with greater 
frequency in the treatment arm than in the com- 
parator. The final two columns are the endpoints of 
95% confidence intervals for the odds ratios compar- 
ing treatment and comparator groups in the pooled 
data, computed using a normal approximation for 
the log (odds ratio) after adding 0.5 to every cell 
of each 2x2 table. It is clear that many of these 
issues are associated with treatment, and we wish 
to investigate the commonality of these medically 
related issues, as well as the possibility that certain 
subgroups of subjects may be more or less affected 
by these associations. 

Display 2 shows the four covariates selected as 
grouping variables for this analysis: Gender, Study 
ID, Renal History and Age. Recall that of the 8 stud- 
ies being pooled, there were 4 studies for each of two 
potential indications for the drug. The Study ID val- 
ues of A1-A4 and B1-B4 distinguish the studies for 
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Treatment 


Comparator 


Gender = F 


908 


685 


Gender = M 


2202 


1957 


Studv = Al 


246 


84 


Studv = A2 


120 


120 


Studv = A3 


239 


80 


Studv = A4 


191 


63 


Studv — R1 


102 


1 03 


Studv — R2 


17 




Study = B3 


123 


120 


Study = B4 


2072 


2061 


Renal history = Y 


190 


191 


Renal history = N 


2920 


2451 


Age = 50 or under 


382 


348 


Age = 51 to 65 


1089 


902 


Age = 66 to 75 


948 


820 


Age = Over 75 


691 


572 


All patients 


3110 


2642 



Display 2. Distribution of subjects by covariates and treat- 
ment arm. 



indication A and indication B. The Renal History 
variable distinguishes those subjects whose medical 
history (before randomization) includes one or more 
renal problems. As can be seen from Display 2, there 
are many more male than female subjects, and the 
age range 51 to 75 predominates. Three of the stud- 
ies for indication A had about a 3:1 split of Treat- 
ment to Comparator subject counts, while the other 
five studies are more equally split. Study B2 had 
only 28 subjects total, while Study B4 had 4133 sub- 
jects, over two-thirds of the total in the pool. Only 
about 7% of the subjects had a previous history of 
renal problems. 

Display 1 shows that five of the ten issues af- 
fected fewer than 10 Comparator-group subjects, 
whereas there are 16 separate covariate groups in 
Display 2. This makes it unlikely that those rare is- 
sues would occur in every treatment -covariate com- 
bination, which is necessary for convergence of a stan- 
dard LR where the model includes all treatment- 
covariate interaction terms. In fact, only 3 of the 
10 issues satisfy this condition, confirming the ne- 
cessity of some special technique such as MBLR 
to try to estimate treatment-by-covariate interac- 
tions, and, in fact, even a main-effects only model 
would not be estimable by standard logistic regres- 
sion applied to the rarer of these response issues, 
making the regularized LR necessary for this exam- 
ple. 



Posterior Distributions for Prior Standard 
Deviations 

This example has K = 10, J = 4 and G = 16, with 
the total number of parameters (elements of 0) to 
estimate being M = 2{G + 1){K + 1)-1 = 373, with 
M* = 285 degrees of freedom. Display 3 shows var- 
ious results function of the four prior stan- 
dard deviations. The top row describes the reg- 
ularized LR case where aA = 5, co = 5, as = 0.001, 
r = 0.001. The rows labeled 1-33 in Display 3 show 
results for the final grid used to approximate the 
posterior distribution of (p. The row 1 values are the 
maximum posterior estimates (transformed from the 
scale of A to that of (p) estimated by the final re- 
sponse surface fit described above. Rows 2-33 show 
the remaining values of the final stage grid. In this 
example, all stages of the estimation required a to- 
tal of about 400 iterations through the data, that 
is, about 400 evaluations of (8) and its first and sec- 
ond derivatives with respect to M* = 285 parame- 
ters. 

The rightmost column in Display 3, headed "PROB, 
shows the values of 1007r%, as defined by (16). As 
discussed above, these probabilities have been ad- 
justed so that the discrete distribution of A matches 
the means and variances of the continuous distribu- 
tion of A as estimated by the response surface fit to 
the values of log (7. 

The bottom two rows of Display 3 show the pos- 
terior mean and standard deviations of the compo- 
nents of (f) using this 33-point discrete approxima- 
tion. It can be seen that the values are approxi- 
mately {a A = 0.34, 0-0 = 0.76, as = 0.15, r = 0.20). 
The value in the row marked "Mean" and the col- 
umn marked "PROB" is computed as vr^ = 0.0639, 
which is a measure of the dispersion of the probabil- 
ities TTs- The smaller it is, the more spread out are 
the probabilities among the 33 grid points. Large 
values of X^^vr^, say, values above 0.2, would imply 
that the scale or location of the grid might be poorly 
chosen, so that only a few points on the grid are very 
probable. 

Comparison of MBLR and RLR Estimates of 
Treatment Effects 

Display 4 shows estimation results for the treat- 
ment main effects for each of the two methods and 
for each response event and for the prior mean of 
all responses. The prior mean odds ratio is defined 
as exp(i?o); whereas the treatment odds ratio for 
the kth response is exp(/3ofc). For each combination 
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a A 


CTq 


as 


T 


PROB 





5.000 


5.000 


0.001 


0.001 


0.00% 


1 


0.327 


0.688 


0.161 


0.196 


15.90% 


2 


0.276 


0.505 


0.110 


0.312 


1.87% 


3 


0.276 


0.505 


0.232 


0.118 


3.02% 


4 


0.276 


0.879 


0.110 


0.118 


2.71% 


5 


0.276 


0.879 


0.232 


0.312 


4.32% 


6 


0.384 


0.505 


0.110 


0.118 


4.25% 


7 


0.384 


0.505 


0.232 


0.312 


1.83% 


8 


0.384 


0.879 


0.110 


0.312 


5.01% 


9 


0.384 


0.879 


0.232 


0.118 


1.75% 


10 


0.252 


0.423 


0.090 


0.091 


0.14% 


11 


0.252 


0.423 


0.276 


0.387 


0.42% 


12 


0.252 


0.969 


0.090 


0.387 


0.88% 


13 


0.252 


0.969 


0.276 


0.091 


0.83% 


14 


0.416 


0.423 


0.090 


0.387 


0.51% 


15 


0.416 


0.423 


0.276 


0.091 


0.10% 


16 


0.416 


0.969 


0.090 


0.091 


5.40% 


17 


0.416 


0.969 


0.276 


0.387 


0.13% 


18 


0.231 


0.688 


0.161 


0.196 


1.86% 


19 


0.448 


0.688 


0.161 


0.196 


2.13% 


20 


0.327 


0.350 


0.161 


0.196 


1.37% 


21 


0.327 


1.053 


0.161 


0.196 


6.96% 


22 


0.327 


0.688 


0.074 


0.196 


8.12% 


23 


0.327 


0.688 


0.327 


0.196 


0.75% 


24 


0.327 


0.688 


0.161 


0.070 


5.80% 


25 


0.327 


0.688 


0.161 


0.473 


3.21% 


26 


0.192 


0.688 


0.161 


0.196 


0.87% 


27 


0.518 


0.688 


0.161 


0.196 


2.43% 


28 


0.327 


0.232 


0.161 


0.196 


0.02% 


29 


0.327 


1.196 


0.161 


0.196 


4.69% 


30 


0.327 


0.688 


0.049 


0.196 


5.54% 


31 


0.327 


0.688 


0.447 


0.196 


0.00% 


32 


0.327 


0.688 


0.161 


0.041 


6.18% 


33 


0.327 


0.688 


0.161 


0.670 


1.01% 


Mean 


0.336 


0.756 


0.146 


0.196 


6.39% 


St.Dev. 


0.053 


0.183 


0.053 


0.105 






Display 3. 


Calculation summary for 


the final grid of prior 


standard deviations. 





the odds ratio and its approximate 90% confidence 
(credible) interval are shown, based on (18). Com- 
paring the MBLR to the RLR estimates, we see that 
the MBLR estimates are pulled away from the RLR 
estimates and "shrunk" toward the MBLR prior 
mean, which represents the average or typical odds 
ratio across response issues. The degree of shrinkage 
is greatest for the highest- variance RLR estimates, 
corresponding to the rare issues such as Anuria and 
Urine output increased. For these two issues, al- 



though the MBLR odds ratio estimate is smaller 
than the corresponding RLR odds ratio, but so are 
their posterior variances, so that the lower bounds 
of the MBLR intervals are greater, providing greater 
statistical significance from the null hypothesis of 
OR = 1. Even though all 8 occurrences of Anuria 
were in the treatment arm, the treatment effect does 
not show up as significant with the multiple-predictor 
RLR model — the MBLR estimate of the effect on 
Anuria seems more reasonable. 
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Odds Ratios for TERM = Treatment 

OS 2 S 10 50 200 



PRIOR MEAN RLR 
PR)OR_(iiEAN MBLR 

Anuria RLR 
Anuria MBLR 

Dry mouth RLR 
Dry mouth MBLR 

Hyperkalaemia RLR 
Q Hyperkalaemia MBLR 
O 

J M I cturi lion. urgency RLR 
uj Micturition urgency MBLR 
S 

Noctuna RLR 
K Nocturia MBLR 

p Pollakiuria RLR 

SS Pollakiuria MBLR 

^ Polydipsia RLR 

Polydipsia MBLR 

Polyuria RLR 
Polyuria MBLR 

Thirst RLR 
Thirst MBLR 

Urine output increased RLR 
Urine, output, in creased MBLR 



1000 5000 



T 1 1 1 1 1 r 



0.5 



5 10 



50 



200 



1000 5000 



Odds Ratio (90% CI) 

Display 4. Estimates of main effect of treatment by method and response variable. 



Inspection of Display 4 shows that not all of the 
MBLR confidence intervals are narrower than the 
corresponding RLR interval. The reverse is true for 
the more frequent responses such as Hyperkalaemia 
and Thirst. In these cases, the MBLR estimates do 
not benefit much from the relatively weak prior dis- 
tribution, and their posterior variances are adversely 
impacted by the uncertainty in the variance compo- 
nent estimation as well as the need to estimate all of 
the interaction parameters, which are assumed away 
by the RLR model. 

MBLR Estimates of Prior Means 

Display 5 graphs the MBLR estimates of the (ex- 
ponentiated) prior means {Ag,BQ,Bg}, with their 
90% CIs. These are interpreted as effects for a "typ- 
ical" response variable. Remembering that coeffi- 
cients for categories of each covariate must sum to 0, 
the corresponding odds ratios must average to 1 
when plotted on a log scale. The middle interval 
shows the main effect of treatment, the intervals 
above show covariate main effects, and the intervals 
below show treatment interactions. As also shown in 
Display 4, the treatment effect prior mean is about 
4.4 on the odds ratio scale, with 90% limits of (2.7, 



7.1). The main effects of covariate estimates, shown 
above the treatment line, can be thought of as the 
effects of covariate categories within the compara- 
tor arm, and as centers of shrinkage across the re- 
sponses. Thus, the rates of these events in the com- 
parator arm are somewhat less for Age:50 and un- 
der and for Renal IIistory:N. Also, Study:A2 had 
a particularly high event rate, while Study:B4 had 
a particularly low event rate. But none of these dif- 
ferences in groups based on covariates are as large 
as the treatment effect. 

The lower set of estimates in Display 5 portray the 
treatment-covariate interactions. As can be seen, 
these effects are smaller than the main covariate ef- 
fects and much smaller than the main treatment ef- 
fect. The treatment effect estimates within the four 
studies for Indication A are all larger than the four 
estimates for the Indication B studies, but the uncer- 
tainty intervals all overlap considerably. Although 
this does not rule out larger interaction effects for 
some of the response variables, the fact that a a, is 
about 0.3 and both as, and r are each less than 
0.2 means that such effects for individual responses 
are also likely to be fairly small. Since fJo is about 
0.76, there is more room for variation in treatment 
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Main Effects and Treatment Interactions for RESPONSE = PRIOR MEAN 



MBLR AQe:50 and under 
MBLR Age 61 10 65 
MBLR Ase66lo75 
MBLR Age over 75 
MBLR CendeiF 
MBLR Gender M 
MBLR RenjlHi&torv U 
MBLR Renal Hi slory Y 
MBLR StwlyAI 
MBLR Study :a2 
MBLR StudyAJ 
MBLR Stu<jy A4 
MBLR Stu<Jv:B1 
MBLR Stu<lya2 
MBLR Study :B3 
MBLR Study :B<I 
MBLR TreatrTtent 
MBLR Trt'AgeiSOandLnder 
MBLR TffAse:51 10 65 
MBLR Tn'A5e66lo75 
MBLR TifAgeover75 
MBLR TrT Gender F 
MBLR Trt'GertderM 
MBLR TrfRenslHistory:N 
MBLR TrCRenalHis4ory Y 

MBLR TrfStudy.Al 

MBLR TrVStudyA? 

MBLR Tn-StudyAS 

MBLR TfTStudy;Ail 

MBLR Tn-Stud/ B1 

MBLR Til'StudyBi 

MBLR TiTStudy.B3 

M8LR Til-study;aJ 




Odds Ratio (90% CI) 



Display 5. Estimates of PRIOR.MEAN from MBLR. 



main effect among the responses, as we also saw in 
Display 4, where the Treatment odds ratios ranged 
from 1.3 for Hyperkalaemia to 7.4 for Polydipsia. 

The prior means of the treatment by covariate 
interactions (the bottom 16 intervals of Display 5) 
have especially small posterior means, as might be 
expected given that they have been shrunk toward 
because of the small value of r, with posterior mean 
= 0.196 in Display 3. Another way of saying this is 
that the estimates of Bg were so small compared to 
their sampling variances that only a small value of r 
is compatible with these results and the assumption 
of (6). 

The estimates of prior means under the regular- 
ized LR model are less interesting. Assuming that a a 
and do are large implies that Ag and Bq cannot be 
estimated well and will thus have wide confidence 
intervals, and of course assuming no interactions 
means that the Bg = for > 0. 

Breakdown of Estimates by Study for Issue 
Pollakiuria 

Display 6 shows the MBLR 90% intervals for odds 
ratios relating to the Study ID covariate and the is- 
sue Pollakiuria (very frequent daytime urination). 



The 2x2 table information in Display 1 shows that 
this was highly associated with treatment (193:34 
split by treatmentxomparator). Our discussion fo- 
cuses on whether and how the results differ by the 
studies being pooled, and what summary conclu- 
sions are justified across studies. The goal is sim- 
ilar to meta-analysis, except that we have complete 
data from each study and so can adjust for more 
potentially biasing between-study differences. 

The top eight intervals in Display 6 show the Study 
main effects, corresponding to relative differences 
among the comparator arm odds of reporting Pol- 
lakiura within the studies. These differential esti- 
mates are adjusted for the other covariates Age, 
Gender and RenalHistory. There are relatively large 
and significant study effects, especially between 
Study A2 and Study B4, where the estimated odds 
ratio is over 8 (2.8 versus 0.34 on the horizontal 
axis), with relatively narrow 90% intervals. 

The next set of eight intervals shows the Treat- 
ment by Study interaction estimates. Although the 
differences are not as large as in the comparator 
arms, the pattern is similar, in that the studies that 
had a large base rate of Pollakiuria tended to have 
larger increases in adjusted Pollakiura rates. The 



14 



W. DUMOUCHEL 



Effects Compared Across Studies for RESPONSE = Pollakiuria 



UBLR StuOyAI 
MSLR StlKty A2 
MBLR Stutly:A3 
MBLR Stuay M 
MBLR stuayBt 
MBLR S1udyB2 
MBLR Sludy BS 
MBLR Sluav B4 
MBLR UrSluflyAt 
MBLR TffSludyA2 
MBLR Trt- Study A3 
MBLR TrfStuOyM 
MBLR TrfBluayBI 
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Display 6. MBLR estimates of odds ratios relating to the Study covariate for the response Pollakiuria. 



three studies having the largest treatment effects 
(Al, A2, A4) are all based on Indication A. These 
estimates are somewhat hard to interpret, being ra- 
tios of odds ratio estimates. The lower set of inter- 
vals return to the simple odds ratio scale by adding 
(on the log scale) the interaction estimates to the 
main effect of treatment. The very bottom interval 
shows the 90% interval for the treatment main ef- 
fect, and the central points for the eight intervals 
above it average to the center point at the bottom. 

These last 9 intervals in Display 6 are reminis- 
cent of the way a meta-analysis is often presented in 
a "ladder plot," with estimates of effect for each study, 
and followed by a combined treatment estimate at 
the bottom. However, there are certain differences 
due to the more complex MBLR model. First, as 
mentioned above, these estimates have been adjusted 
for differential covariate distributions across stud- 
ies. Second, the Pollakiuria estimates here have been 
shrunk toward the prior mean estimates of the odds 
ratios involving all responses. Third, the shrinkage of 
interaction estimates toward 0, governed by r in (6), 
is similar to the shrinkage toward a common mean 
effect that occurs in a random effects meta-analysis. 
Fourth, the weight that each study contributes to 



the overall estimate is governed by a more complex 
formula than in either the standard fixed or ran- 
dom effects meta-analyses. However, it does share 
with the random effects methodology the fact that 
relative weights are much attenuated compared to 
relative sample sizes. Finally, this more complex cal- 
culation means that the single-study treatment es- 
timates in the above MBLR graph do not preserve 
the between-study differences, as might be shown in 
a standard meta-analysis presentation. 

The response Pollakiuria was chosen as the exam- 
ple for Display 6 because that issue showed a greater 
Treatment-by-Study effect than other issues: for ex- 
ample, in Display 6 the Trt Study:Al effect is 1.33, 
while the Trt Study:B4 effect is 0.79, for a ratio of 
1.68, and the two 90% intervals barely overlap. Is 
this post-hoc selection legitimate? Clearly, this way 
of finding "interesting" results is biased in many 
standard settings. However, the Bayesian shrinkage 
methodology tends to offset such biases, as will be 
seen in the simulation results to follow. 

6. SIMULATION STUDY OF MBLR AND RLR 

The statistical properties of MBLR are studied 
using a simulation of the model that MBLR as- 
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sumes. The purpose is to compare the accuracy of 
the MBLR results with that of the RLR results in 
the context of a situation like that in the example 
of Section 5, where there are rare events and sparse 
data. The simulation emulates that example in the 
sense that the distribution of subject covariates and 
treatment assignment matches the data in Section 5 
exactly. Also, the list of response issues is the same 
and the baseline probabilities (as measured by the 
intercept term in the logistic regressions) of each re- 
sponse in the simulation are similar to that in the 
data of Section 5. The protocol for each simulation 
involves the following steps: 

1. Set the K intercept term values aofc) one for 
each of the responses. 

2. Set the G + 1 prior means Ai,A2, . . . , Aq, Bq. 

3. Set the four prior standard deviations (j) = {cta, 
<7o,o-b,t). 

4. Repeat steps (5 through 12) iVsiM times: 

5. Draw {ogk} from N{Ag,a'j^), g = l,...,G; 
k = l,...,K. 

(Note: all random variable generation is 
performed using built-in R functions. Also, 
constraints that agk must sum to as (7 
varies over the categories of each single co- 
variate are enforced by subtracting means 
over the corresponding covariate from the 
originally drawn Og^. An analogous proce- 
dure is used in steps 7 and 8.) 

6. Draw {/3ofe} from N{Bo,a^), k = l,...,K. 

7. Draw {Bg} from N{0,t^), g = I, . . . ,G. 

8. Draw {^gk} from N{Bg,al), g = l,...,G; 
k = l,...,K. 

9. For each set of subjects having the same 
covariate values and treatment assignment, 
compute Zj/c and Pj^ using (1) and (2), i = 
1, . . . , m; k = 1, . . . , K . 

10. Draw {Nik} from binomial {ni,Pik), i = 
1,. . . ,m; k = 1, . . . , K . 

11. Fit both the MBLR and the RLR model to 
the counts {Ni^}. 

12. Update cumulative summaries of estimation 
results for each simulation as described be- 
low. 

13. Create reports summarizing the estimation ac- 
curacy of the two methods regarding all param- 
eters. 

Simulation Summary Statistics 

There are M = 2{G + 1){K + 1) — 1 parameters be- 
ing estimated and two estimation methods: MBLR 
and RLR, so the total number of estimators being 



evaluated is -R = 2M. For simulation s (.8 = 1,..., 
-^sim) and for estimate r {r = 1, . . . , R), let: 

9rs = true value of parameter r for simulation s, 
as defined by steps 1, 2, 5, 6, 7 and 8, 
Qrs = estimated value (posterior mean) of param- 
eter r for simulation s, 

SBrs = estimated SE (posterior standard devia- 
tion) for parameter estimate r for simulation s, 
BIASr- = X^s(?rs — drs)/NsiM [average estimation 
error] , 

RMSEr = y/iYlsilrs - Ors)'^/NsiM) [square root 

of mean squared estimation error] , 

■^r = {Y^si'lrs-Orsf/sejs)/NsiM [average squared 

standardized estimation error], 

CI.05r = (# times qrs + 1.645sers < OrsJ/NsiM 

[proportion of times 90% CI is too low] , 

CI.95r = (# times qrs — 1.645sers > ^rs)/A'^3iM 

[proportion of times 90% CI is too high] . 

These summary statistics focus on the estimation 
accuracy of Qrs and also on the calibration accuracy 
of scrs- We want BIAS and RMSE to be as close to 
as possible, we want to be near 1, and we want 
CI. 05 and CI. 95 to be near 0.05. The R estimates 
can be grouped by the two methods, the {K + 1) 
responses (counting PRIOR_MEAN general- 
ized response) and the 2G + 2 different term defi- 
nitions. The term definitions fall into three general 
term types: 

COY = {Ag, agk}, 
TREAT = {Bo, /3ofc}, 
TRT* COY = {Bg,f3gk}. 

We can summarize the simulation of the R es- 
timates by averaging the six accuracy summaries 
listed above over groups defined by method, response 
and/or term type. 

Finally, for the MBLR, we can summarize the pos- 
terior means and standard deviations of the four es- 
timated prior standard deviations. 

Simulation Design 

The simulations are designed to compare varia- 
tions in three design factors, each at two levels, so 
that 8 separate simulations were performed, with 
each simulation having A'^giM = 250 replications, and 
so that a simple comparison of the two levels of each 
factor will be based on 1000 replications at each 
level. The three design factors correspond to two 
different choices at each of the first three steps in 
the simulation protocol given above: 
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Response 


Intercept 


Base.Prob 


1. Hyper kalaemia 


-2.906 


0.0547 


2. Thirst 


-3.333 


0.0357 


3. Dry mouth 


-3.429 


0.0324 


4. Pollakiuria 


-3.645 


0.0261 


5. Polyuria 


-4.787 


0.0083 


6. Nocturia 


-5.646 


0.0035 


7. Polydipsia 


-5.819 


0.0030 


8. Micturition urgency 


-6.618 


0.0013 


9. Urine output increased 


-6.972 


0.0009 


10. Anuria 


-7.722 


0.0004 



Display 7. Estimated values of intercept terms aok that 
are used in the simulations. When K — 5, only the first 5 
responses m the display are used. The baseline probabilities 
are defined as exp(aofc), which range from 0.055 for Hyper- 
kalaemia to 0. 00044 for Anuria. 



Factor 1, Level 1: Frequent responses only (most fre- 
quent 5 in the example data). 

Factor 1, Level 2: Both frequent and rare responses 
(all 10 issues in the example data). 

The K = 10 situation uses the same 10 issues 
as in the example of Section 5, with the values 
of the intercept terms aok set equal to the esti- 
mated values from the real data, shown in Dis- 
play 7. The baseline probabilities are defined as 
exp(Q:ofc), also shown, which range from 0.055 
for Hyperkalaemia to 0.00044 for Anuria. When 
K = 5, the most frequent 5 response issues are 
used, as shown in rows 1-5 of Display 7. 

Factor 2, Level 1: Average of main effects = {Ag = 
for all g, Bo = 0). 

Factor 2, Level 2: Prior Means of main effects rela- 
tively large nonzero values. 

For Level 2, the values of , . . . , Ag,Bq are set 
to two times the values estimated in the analy- 
sis of the actual data. Display 8 shows the co- 
efficient values as estimated and as used in the 
simulations. 

Factor 3, Level 1: aA = 0.4, ctq = 0.6, ctb = 0.2, r = 

0.2 (Small PSDs). 
Factor 3, Level 2: aA = 1-0, ctq = 1.2, = 0.8, r = 

0.8 (Large PSDs). 
The Level 1 values of prior standard deviations 

are similar to those estimated from the example 

data, while the Level 2 values are significantly 

larger. 

All simulations create 5752 subjects having the 
same joint distribution of covariates and treatment 
allocations as the actual data and as summarized 



Term 


Estimated 


Level 1 


Level 2 


Gender: F 


0.028 





0.056 


Gender: M 


-0.028 





-0.056 


Study: Al 


0.094 





0.188 


Study: A2 


0.529 





1.058 


Study: A3 


-0.325 





-0.65 


Study: A4 


0.33 





0.66 


Study: Bl 


0.293 





0.586 


Study: B2 


-0.232 





-0.464 


Study: B3 


-0.273 





-0.546 


Study: B4 


-0.417 





-0.834 


RenalHistory: N 


-0.187 





-0.374 


RenalHistory: Y 


0.187 





0.374 


Age: 50 and under 


-0.251 





-0.502 


Age: 51-65 


0.109 





0.218 


Age: 66-75 


0.239 





0.478 


Age: over 75 


-0.097 





-0.194 


Treatment 


1.484 





2.968 



Display 8. Prior means for the main effects, as estimated 
by MBLR from the real data, and as varied m the simulations, 
either all zeros (Level 1), or set to twice the estimated values 
(Level 2). 



in Display 2. Thus, G = 16 and M = 203, R = 566 
when K = h, while M = 373, R = 1066 when K = 10. 

Simulation Results 

Display 9 shows summaries of the distributions 
of (square roots of) variance component estimates, 
which are denoted PSDs for prior standard devia- 
tions in the model equations (3)-(6). Since there are 
4 separate PSDs in the model, and the simulations 
are run at two sets of PSDs, the scales of all the 
PSDs in Display 9 have been normalized by divid- 
ing each estimated PSD and each estimated sam- 
pling standard deviation by the true PSD used in 
the corresponding simulation. Thus, a value of 1 for 
an average estimated PSD in Display 9 is interpreted 
as an unbiased estimate, and a value of 0.1 for the 
standard deviation of the sampling distribution of 
a PSD in Display 9 is interpreted as a coefficient of 
variation of 10%. 

Display 9 has 8 columns and 7 rows. There are 
4 pairs of columns, corresponding to the sampling 
means and standard deviations of the estimates of 
each of the four PSDs in the model. The 7 rows of 
Display 9 correspond to different subsets of the 2000 
simulations. Row 1 shows averages over all simula- 
tions, whereas the other rows show averages over 
a subset of 1000 simulations corresponding to the 
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0"A 






SDcro 






T 


SD^ 


All MBLR simulations 


L035 


0.130 


1.005 


0.271 


0.988 


0.236 


1.088 


0.368 


Responses: Frequent 


1.044 


0.144 


1.004 


0.283 


1.004 


0.253 


1.073 


0.373 


Responses: Freq + Rare 


L026 


0.116 


1.007 


0.260 


0.971 


0.219 


1.102 


0.363 


Mean effects: Zero 


1.034 


0.133 


1.005 


0.276 


1.000 


0.245 


1.091 


0.376 


Mean effects: Large 


1.035 


0.126 


1.006 


0.267 


0.975 


0.227 


1.084 


0.360 


Prior SDs: Small 


1.037 


0.153 


1.131 


0.353 


0.954 


0.340 


1.136 


0.476 


Prior SDs: Large 


1.033 


0.106 


0.879 


0.190 


1.022 


0.132 


1.039 


0.259 



Display 9. Summary of estimation of prior standard deviations (PSD) in the MBLR simulations. All estimated PSDs are 
divided by the true PSD to put their sampling distributions on a common scale. The row "All Simulations" shows means 
and standard deviations of normalized estimates across all 2000 simulations. Other rows show results for subsets of 1000 
simulations broken down by the two levels of each of the three design factors in the experiment. See text for explanation of the 
design factors and their levels. 



levels of each of the factors in the experimental de- 
sign. For example, consider the columns labeled ctq 
and SD(To in Display 9. In row 1, 1.005 implies that 
overall the mean of estimates of the Treatment PSD 
are within 0.5% of the true value, and the next value 
of SDo"o = 0.271 implies that individual estimates 
are typically about 27% off the true value. Of course 
that value is principally reflective of the sample size 
and the experimental design of the clinical studies. 
All simulations used the same clinical study setups, 
but there was variation according to the three fac- 
tors in the simulation. Going down the rows in these 
same two columns, we see that estimates of do had 
almost exactly the same means and standard devia- 
tions whether all ten responses were being simulated 
or whether just the most frequent five responses 
were simulated. Similarly, the next two rows show 
that there was virtually no difference in the sam- 
pling means and standard deviations of <to between 
the situation where the average effects are about 
versus relatively large effects. However, the final two 
rows of the Display show that when all four PSDs are 
small {a A = 0.4, do = 0.6, as = 0.2, r = 0.2) the es- 
timate of (To is biased upward about 13%, and when 
all four PSDs are large [a a = 1-0, (Tq = 1.2, ub = 0.8, 
T = 0.8) it is biased downward by about the same 
percentage. The direction of the biases implies that 
estimates tend to be somewhat more central with re- 
spect to the restricted range imposed (0 < ctq < 1.5) 
than the true value, thus moderating the estimates. 
The coefficient of variation of ctq is about 35% in the 
former case and about 19% in the latter case. This 
corresponds to roughly the same standard deviation 
of the estimate of (Tq whether o"o is 0.6 or 1.2. 

This effect only shows up with respect to cjo; the 
other columns in Display 9 show that mean esti- 
mates of a A, ctb and r are relatively unaffected by 



any of the three factors in the simulation, especial- 
ly a A and as- Consideration of degrees of freedom 
may explain this — these two variance components 
have {G — J){K — 1) degrees of freedom, whereas 
(To has K — 1 di and r has G — J df, so one might 
expect them to be harder to estimate (although the 
definition of degrees of freedom is somewhat fuzzy 
in this nonlinear Bayesian setting). Estimates of r 
seem to be most variable percentagewise, with co- 
efficient of variation in the 30-40 percent range. In 
all cases the coefficient of variation is larger for the 
smaller true PSDs. The standard deviation of esti- 
mation decreases when the true PSD decreases, but 
not fully proportionally. 

However, remember that the goal of the analysis is 
not to estimate the variance components per se, but 
to use them to define a model that can better esti- 
mate the logistic coefficients by adjusting to global 
patterns in the data across responses and predic- 
tor categories. Each individual estimation does not 
assume that the PSDs are exactly equal to their pos- 
terior mean, but rather the estimation involves an 
integration across the posterior distribution of the 
PSDs. In that respect, it is interesting to examine 
the posterior standard deviations of the PSDs. They 
have not been included in Display 9 in order to save 
space, but in fact the average of the posterior stan- 
dard deviations across simulations was remarkably 
similar to the sampling standard deviations of the 
posterior means of each PSD. They typically differed 
by only 10% or so for each of the 8 sets of 250 simula- 
tions. Thus, our model expects that the PSDs will be 
hard to estimate and works within that uncertainty. 

Estimation of Logistic CoefFicients 

Display 10 summarizes the simulation distribu- 
tions of the various logistic regression coefficients. 
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(a) Treatment effect prior mean Bq Treatment effect for responses /3ofc 







rTZ 

Z 


C1.05 


LI. 95 


KJViaJii 




Li. 05 


CI. 95 


All RLR simulations 


0.719 


0.192 


0.000 


0.003 


1.066 


26.026 


0.108 


0.354 


All MBLR simulations 


0.383 


1.167 


0.056 


0.070 


0.466 


1.248 


0.061 


0.074 


Responses: Frequent 


0.424 


1.235 


0.068 


0.067 


0.314 


1.200 


0.059 


0.073 


Responses: Rare 


0.343 


1.099 


0.044 


0.073 


0.619 


1.297 


0.063 


0.075 


Mean effects: Zero 


0.386 


1.163 


0.046 


0.076 


0.491 


1.284 


0.052 


0.090 


Mean effects: Large 


0.381 


1.170 


0.066 


0.064 


0.442 


1.212 


0.070 


0.058 


Prior SDs: Small 


0.286 


1.023 


0.043 


0.062 


0.375 


1.217 


0.058 


0.073 


Prior SDs: Large 


0.481 


1.310 


0.069 


0.078 


0.557 


1.280 


0.064 


0.076 



(b) Covariate effect prior means Ag Covariate effect for responses otgu 





RMSE 




CL05 


CI.95 


RMSE 




CI.05 


CL95 


All RLR simulations 


0.490 


0.105 


0.000 


0.000 


0.819 


11.662 


0.166 


0.190 


All MBLR simulations 


0.297 


0.972 


0.044 


0.052 


0.373 


1.041 


0.049 


0.057 


Responses: Frequent 


0.323 


0.959 


0.043 


0.052 


0.280 


1.041 


0.049 


0.056 


Responses: Rare 


0.272 


0.986 


0.045 


0.052 


0.466 


1.042 


0.049 


0.057 


Mean effects: Zero 


0.300 


0.959 


0.042 


0.051 


0.388 


1.026 


0.047 


0.056 


Mean effects: Large 


0.295 


0.986 


0.046 


0.053 


0.358 


1.056 


0.051 


0.057 


Prior SDs: Small 


0.205 


0.990 


0.044 


0.055 


0.283 


1.048 


0.050 


0.057 


Prior SDs: Large 


0.390 


0.954 


0.043 


0.049 


0.463 


1.034 


0.048 


0.057 



(c) Interaction effect prior means Bg Interaction effect for responses /3gfc 





RMSE 


Z' 


C1.05 


CL95 


RMSE 


Z^ 


C1.05 


C1.95 


All MBLR simulations 


0.347 


3.189 


0.151 


0.144 


0.346 


1.116 


0.059 


0.057 


Responses: Frequent 


0.353 


2.940 


0.144 


0.141 


0.292 


1.110 


0.059 


0.056 


Responses: Rare 


0.340 


3.439 


0.159 


0.147 


0.399 


1.122 


0.059 


0.057 


Mean effects: Zero 


0.347 


3.114 


0.150 


0.140 


0.360 


1.120 


0.060 


0.057 


Mean effects: Large 


0.346 


3.265 


0.152 


0.148 


0.331 


1.112 


0.059 


0.056 


Prior SDs: Small 


0.171 


2.489 


0.130 


0.122 


0.203 


1.174 


0.062 


0.061 


Prior SDs: Large 


0.522 


3.890 


0.173 


0.166 


0.488 


1.058 


0.056 


0.053 



Display 10. Summary of estimated logistic coefficient distributions within the simulations. Separate subtables for (a) treat- 
ment effects Bo and Pok, (b) covariate main effects Ag and ctgk, (c) treatment-by-covariate interactions Bg and Pgk 
(g — 1, . . . ,G). See text for explanation of the summary statistics. 



Part (a) of Display 10 focuses on the main effect of 
Treatment. The first two rows of Display 10(a) com- 
pare the Treatment effect accuracy of the RLR esti- 
mates to that of the MBLR estimates. The first four 
columns refer to the estimation of the prior mean co- 
efficient, Bq, what might be called the "all response 
summary," while the last four columns refer to the 
estimation of coefficients, /Jofcj for the individual re- 
sponses. Across all 2000 simulations, the RMSE for 
RLR is almost double that of MBLR for estima- 
tion of Bq, and more than double, on average, for 
estimating the /3ofc . Since statistical efficiency is typ- 
ically inversely proportional to the square of RMSE, 
this implies that MBLR is about 4 times as efficient 
as RLR at estimating treatment/comparator odds 
ratios in this setting. 



The statistic Z'^ is designed to measure the cal- 
ibration of the posterior standard deviations com- 
puted by a method to the actual sampling distribu- 
tion, where = 1 implies perfect calibration. When 
^ 1, the claimed standard errors of coefficients 
are too optimistic (too small), and the reverse is 
true when <C 1. The values of provide similar 
information to the counts of times confidence inter- 
vals fail to enclose the true values of coefficients. 
When is too large and putative standard errors 
are too small, the too-short confidence intervals will 
miss the true values more than the nominal percent 
of times, and conversely. Looking at the first two 
rows of Display 10(a), we see that RLR is poorly 
calibrated in this sense. Computed standard errors 
are too large for the all-response summary and too 
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small for the individual response treatment effects. 
As a result, supposedly 90% confidence intervals had 
99.7% coverage for the all-response summaries and 
only 53.8% coverage for individual response treat- 
ment effects. In contrast, the MBLR estimates are 
much better calibrated, with about 1.2 and nomi- 
nally 90% intervals having coverage probabilities av- 
eraging about 87%. 

The remaining rows of Display 10(a) show the be- 
havior of the MBLR estimation for subsets of simu- 
lations defined by the three two- level factors. Rows 3 
and 4 compare results for simulations with the 5 
more frequent responses to those for the 5 less fre- 
quent responses. In the latter case, although the 
runs generated all 10 responses and all 10 were used 
in the analysis, the results in the row labeled "Re- 
sponses: Rare" are based only on accuracy statistics 
for the 5 least frequent responses, in order to bet- 
ter isolate the estimation ability of MBLR for rare 
events. We see that in fact the RMSE, Z^, and 90% 
interval coverage probabilities are roughly the same 
for the rare and frequent events. (Of course, we as- 
sume that a run with only the five rare events would 
lead to much more variable estimation — it is the 
ability of the Bayesian algorithm to detect and mea- 
sure similarities between frequent and rare events, 
and to "borrow strength" appropriately, that allows 
such accuracy.) The next two rows of Display 10(a) 
show that whether the true prior means are or 
not makes no difference in the estimation properties. 
The final two rows of Display 10(a) show that esti- 
mation is significantly more accurate when PSDs are 
small than when they are large, which make sense, 
because small PSDs imply more commonality across 
the responses, and thus more opportunity to borrow 
strength and increase estimation accuracy. But even 
with the larger set of PSDs, MBLR quite outper- 
forms RLR. 



Display 10(b) shows the corresponding results for 
the estimation of covariate main effects. For this ex- 
ample, the estimation of Ag and a^fc seems to be 
more accurate, using either RLR or MBLR, on av- 
erage for the 16 covariate effects (indexed by g) than 
it was for the single main effect of treatment. How- 
ever, the advantage of MBLR over RLR is about 
the same, both in terms of RMSE and in terms of 
standard error calibration as measured by Z^ and 
the coverage probabilities of nominal 90% intervals. 

Display 10(c) shows the simulation accuracy of 
estimation of covariate-treatment interaction coeffi- 
cients. Since the RLR model does not estimate inter- 
actions, only MBLR results are presented. Looking 
first at the right-hand set of four columns in Dis- 
play 10(c) that refer to estimation of the /3gk, all 
four accuracy measures seem to mimic the values 
in Display 10(b) — it appears that MBLR can es- 
timate individual covariate-treatment interactions 
as accurately as the main effects of covariates. Now 
looking at the first four columns of Display 10(c), 
where the Bg are being estimated, the entire col- 
umn of RMSE values are about the same as in Dis- 
plays 10(a) and 10(b), so the posterior mean esti- 
mates of the Bg are about as accurate as those of Bq 
and of Ag. However, it seems that the posterior stan- 
dard deviations of the Bg are too optimistic, since 
the values of Z^ are about 3 times too large, and 
the error rate of the corresponding nominal 90% in- 
tervals is about 30% instead of 10%. This result is 
puzzling and awaits further investigation. 

Bayesian Shrinkage Estimates are Resistant to 
the Multiple Comparisons Fallacy 

Display 11 shows the remarkable power of Bayesian 
shrinkage estimates to avoid bias even in the pres- 
ence of post-hoc selection of the most significant of 
many estimates. For each simulation, the task is 
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0.985 
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0.078 
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0.061 
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0.972 


0.011 


0.171 


1.312 


0.074 


0.068 


Mean effects: Large 


0.980 


-0.004 


0.175 


1.315 


0.065 


0.071 


Prior SDs: Small 


0.337 


0.019 


0.163 


1.600 


0.084 


0.093 


Prior SDs: Large 


1.616 


-0.011 


0.184 


1.027 


0.055 


0.046 



Display 11. Simulation of the resistance to multiple comparisons bias of MBLR. At each simulation, the most significant 
treatment x covariate interaction was singled out across all responses by selecting the largest of the GK values (G = 16, A' = 5 
or 10, GK — 80 or 160) of (estimated interaction coefficient)/ (estimated posterior s.d. of coefficient). The MBLR estimates 
are unbiased with relatively small RMSE. See text for discussion of other columns. 
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to find the most significant treatment x covariate 
interaction among all the responses. There are 16 
covariate-based subsets in the model that get inter- 
action estimates for every response variable and K = 
10 or 5 responses, making a total of 16K = 80 or 160 
ratios (estimated interaction coefficient) / (estimated 
s.e. of interaction coefficient). The largest ratio (one- 
sided alternative) in each MBLR analysis is selected 
and then the known true value for the selected in- 
teraction is used to compute the accuracy measures. 
This selection and assessment is repeated for each 
of the 2000 simulations. The first column of Dis- 
play 11 is the average of the true coefficients for the 
selected interactions. Remembering that the true in- 
teractions are generated from a N{0,a'^ + r^) dis- 
tribution, where ^ (cr^ -|- r^) = either 0.283 (smaller 

PSDs) or 1.13 (larger PSDs), it is clear from the 
"True Int" column that MBLR is selecting fairly 
large interactions. The column headed BIAS con- 
tains the average difference between the selected es- 
timate and its true value. Remarkably, the MBLR 
post-hoc selections have virtually no bias, either over- 
all or in any of the six factor-based subsets. The final 
four columns in Display 11 show the same accuracy 
measures as those of Display 10. The RMSE values 
in Display 11 are smaller than any of those in Dis- 
play 10, which at first might seem surprising, but is 
a consequence of the fact that the maximum of 80 
or 160 identically normally distributed variates will 
have smaller standard deviation than a single such 
variate, due to the short tail of the normal distribu- 
tion. The calibration of the posterior standard de- 
viations of the selected most significant interaction, 
as measured by the values of Z'^ and the coverage 
probabilities, is not perfect but is similar to that of 
the treatment main effects in Display 10(a). This 
excellent accuracy of MBLR shrinkage estimates in 
the face of post-hoc selection is in spite of the fact 
that the variance components which determine the 
amount of shrinkage were not known in advance but 
were estimated separately for each simulation. 

Discussion of the Simulation 

It should not be surprising that data generated by 
a specific Bayesian model can be better analyzed by 
fitting that model. But these simulations show that 
there is a surprisingly large advantage to doing so, 
and that you give up a lot of efficiency (equivalently 
waste clinical resources) by forgoing such an analysis 
if, in fact, such a model is realistic. With the RLR 
approach, which itself is probably a more efficient 
analysis than straight logistic regression, you give 



up the possibility of estimating treatment-covariate 
interactions and yet still lose accuracy in estimation 
of main effects. The principal nonBayesian alterna- 
tive is to form a single pooled response, treating the 
different issues as equivalent. But then you don't 
even get estimates for the separate issues and you 
would be submerging completely the medical dis- 
tinction between, say, such a serious adverse event 
as Anuria and Dry mouth or Thirst. Our methodol- 
ogy is a "Goldilocks alternative" to the bias- variance 
trade-off, neither as variable as estimating so many 
parameters with no prior shrinkage, nor as biased as 
assuming that all issues have the same response to 
treatment and that all interactions are 0. 

7. SUIVIMARY AND CONCLUSIONS 

Safety issues with low observed frequencies will 
produce standard logistic regression estimates with 
wide confidence intervals (based on highly variable 
sampling distributions). Clinical safety data is often 
of very fine granularity. Each observation of a sub- 
ject's adverse event is described with great precision, 
providing a great multiplicity of events to be tabu- 
lated and whose event frequencies must be compared 
across treatment arms. Defining event groupings for 
the purpose of getting pooled events with more reli- 
able relative frequencies is hard to do in advance, be- 
fore the set of somewhat frequent events is observed. 
After the data are collected, it can be controver- 
sial to lump events together because the selection of 
events to pool can determine how significant Treat- 
ment/Comparator odds ratios become. The multi- 
variate Bayesian logistic regression methodology de- 
scribed here is designed to be a compromise between 
separate analyses of finely distinguished events and 
a single analysis of a pooled event. It requires the 
selection of a set of medically related issues, poten- 
tially exchangeable with respect to their dependence 
on treatment and covariates. 

A key concept underlying the proposed methodol- 
ogy is that a set of K issues have been prespecified 
as important and likely to be biologically and clin- 
ically related. It would be a misuse of the method 
to try very many subsets of a large set of issues, 
stopping only when an "interesting" result is ob- 
tained. A similar caution pertains to selection of 
covariates — only those with some prior justification 
should be included. When too many extraneous co- 
variates are entered, the estimated variance compo- 
nents may lead to over-shrinking those effects and /or 
interactions that are present, and lead to overly nar- 
row confidence intervals. 
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The methodology is exploratory in nature, in that 
the analyst is encouraged to examine the relation- 
ship of the adverse event frequencies to multiple 
covariates and to treatment by covariate interac- 
tions. These more complicated models may not be 
estimable by a standard logistic regression algorithm 
because the data are often too sparse for the num- 
ber of parameters being estimated. Two strategies 
are used to cope with this sparsity. First, a Bayesian 
model allows the analysis of each issue to borrow 
strength from the other issues, assumed medically 
related so that this sort of averaging is not unrea- 
sonable. The fitting of the MBLR model is accom- 
plished by the multiple runs of a maximum like- 
lihood algorithm, together with the estimation of 
Bayes factors for a range of values of the unknown 
variance components. The MBLR algorithm is in- 
tended to be able to measure the degree to which the 
issues have similar main effects and interactions with 
treatment on the logit scale. The hierarchical prior 
specification in equations (3)"(5) allows for partial 
averaging across issues for those model coefficients 
that seem similar. There is also a tendency for the 
treatment x covariate interaction coefficients to be 
shrunk toward the null value of 0, to an extent con- 
trolled by an estimated EB variance parameter as 
in (6). This shrinkage is intended to offset the ten- 
dency of exploratory methods to find "significant" 
subgroup effects purely by chance. Second, a com- 
parison method, denoted regularized logistic regres- 
sion, sets particular values of the variance compo- 
nents in the empirical Bayes model to emulate stan- 
dard logistic regression (without interactions) while 
avoiding computational problems and inestimable 
effects that can be caused by low counts. This mod- 
ification is designed to hardly affect the estimates 
from standard logistic regression when the data are 
not sparse. 

Since treatment-by-covariate interaction coeffi- 
cients are difficult to interpret, the sums of the treat- 
ment main effect plus the covariate interactions are 
also presented and interpreted as the estimated treat- 
ment effect that would hold for subjects in the sub- 
group identified by the covariate value. This com- 
bined effect is apropos to the search for a subject 
subgroup that might be particularly vulnerable to 
an adverse reaction to treatment. 

The goal is to allow safety review of a large amount 
of clinical data using a sophisticated methodology 
that can nevertheless be mastered by those without 
advanced training in Bayesian methods or the the- 



ory of variance component estimation, or the inter- 
pretation of large masses of sparse data. Section 5 
shows an example partial analysis of 10 medically 
related issues within a pool of 8 studies involving 
over 5700 subjects with a model involving treat- 
ment, 5 covariates involving 13 defined covariate 
values, and including examination of treatment-by- 
covariate interactions. Section 6 describes a simula- 
tion study that measures the large gains in efficiency 
that MBLR can attain, compared to separate analy- 
ses for each issue. The striking results in Display 11 
show the ability of Bayesian modeling to greatly re- 
duce bias due to post-hoc selection of the most sig- 
nificant contrast. 

The Multivariate Bayesian Logistic Regression is 
a technique that can add to the tools available to the 
data analyst or medical reviewer. The method does 
not eliminate the need for experimental replicabil- 
ity and convergence with medical knowledge. A sig- 
nificant Bayesian result found in one sample that 
is not replicable may just be indicative of a sam- 
pling problem. With that said, it is hoped that this 
new tool will ease the burden of seeing the forest 
for the trees during the analysis of clinical safety 
data. 
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